-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: list-like to_replace on Categorical.replace is ignored or crash #31734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1,44 @@ | |||
import pandas as pd | |||
import pandas.testing as tm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this behave when In [5]: s
Out[5]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0, 1, 2]
In [6]: s.replace({0: 1})
Out[6]:
0 1
1 1
2 2
dtype: int64 |
@dsaxton this is because in |
Thanks @JustinZhengBC ! Overall looks good |
(1, 2, [2, 2, 3], True), | ||
(1, 4, [4, 2, 3], True), | ||
(4, 1, [1, 2, 3], True), | ||
(5, 6, [1, 2, 3], True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the case when to_replace
and value
are different dtypes need testing? e.g. with to_replace=1
, value="4"
EDIT
actually, maybe that's outside the scope of the issue this PR is addressing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doable, with limited dtype-checking as a full check would involve the category index which causes a crash on comparison on mixed dtypes
([1, 4], [5, 2], [5, 2, 3], False), | ||
], | ||
) | ||
def test_replace(to_replace, value, expected, check_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the issue number here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/_testing.py
Outdated
@@ -1104,6 +1105,8 @@ def assert_series_equal( | |||
Compare datetime-like which is comparable ignoring dtype. | |||
check_categorical : bool, default True | |||
Whether to compare internal Categorical exactly. | |||
check_category_order : bool, default True | |||
Whether to compare category order internal Categoricals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a missing "of" here? Should it be
Whether to compare category order of internal Categoricals
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/core/arrays/categorical.py
Outdated
def replace_list(self, to_replace, value, inplace: bool = False): | ||
return self.replace(to_replace, value, inplace) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this and the change below it were experiments in addressing the dict case and list-list case for categoricals, which turned out to be more complex than expected and are not intended to be in this pr
pandas/core/arrays/categorical.py
Outdated
if is_list_like(to_replace): | ||
if is_list_like(value): | ||
if len(to_replace) != len(value): | ||
raise ValueError("to_replace and value must be same length!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this ValueError be a bit more precise? As in, include something about the lengths of to_replace
and value
which were received
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this and the change above it were experiments in addressing the dict case and list-list case for categoricals, which turned out to be more complex than expected and are not intended to be in this pr
Hello @JustinZhengBC! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-02-17 00:22:52 UTC |
([1, 2, "3"], "5", ["5", "5", 3], True, False), | ||
], | ||
) | ||
# GH 31720 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue number usually goes inside the test:
def test_me(*args):
# GH 1234
Probably not worth changing on its own, but if you get any other comments requesting changes, perhaps this is worth doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -99,7 +99,7 @@ Bug fixes | |||
Categorical | |||
^^^^^^^^^^^ | |||
|
|||
- | |||
- Bug in :class:`Categorical` that would ignore or crash when calling :meth:`Series.replace` with a list-like ``to_replace`` (:issue:`31720`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is tagged with v1.0.2 milestone, I think this should go in that whatsnew file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. move the note to 1.0.2
10fd29e
to
fd0d736
Compare
@jreback I've moved the whatsnew and addressed the other feedback |
also pls merge master |
fd0d736
to
7d3c1a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor comments, ping on green.
@jreback green |
thanks @JustinZhengBC very nice! |
…l.replace is ignored or crash
…is ignored or crash (#32062) Co-authored-by: Justin Zheng <[email protected]>
Series.replace
fails on categoricals with list #31720black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Covers the case where
to_replace
is a list-like andvalue
is a string. Other cases, like "to_replace
is dict andvalue
is None", or "to_replace
andvalue
are both lists" are handled earlier in generic.py